Learning the Reward Model of Dialogue POMDPs from Data
Authors
Abstract
Spoken language communication between humans and machines has become a challenge in research and technology. In particular, equipping health care robots with a spoken language interface has attracted great attention. Recently, because of the uncertainty that characterizes dialogues, there has been interest in modelling the dialogue manager of spoken dialogue systems using Partially Observable Markov Decision Processes (POMDPs). In this context, we would like to learn the reward model of dialogue POMDPs from expert data. In a previous paper, we used an unsupervised learning method to learn the states, as well as the transition and observation functions, of dialogue POMDPs from human-human dialogues. Continuing our objective of learning the components of dialogue POMDPs from data, here we introduce a novel inverse reinforcement learning (IRL) algorithm for learning the reward function of the dialogue POMDP model. Using the introduced method and an available corpus of data, we construct a dialogue POMDP. The dialogue policies learned from this POMDP are then evaluated. Our empirical evaluation shows that the performance of the learned POMDP is higher than expert performance at the no-noise, low-noise, and medium-noise levels, but not at the high-noise level. Finally, current limitations and future directions are addressed.
Similar resources
Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems
This paper presents a novel algorithm for learning parameters in statistical dialogue systems which are modelled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy which selects the system’s responses based on the inferred state; and a reward function which speci...
Learning Observation Models for Dialogue POMDPs
The SmartWheeler project aims at developing an intelligent wheelchair for handicapped people. In this paper, we model the dialogue manager of SmartWheeler in MDP and POMDP frameworks using its collected dialogues. First, we learn the model components of the dialogue MDP based on our previous works. Then, we extend the dialogue MDP to a dialogue POMDP, by proposing two observation models learned...
Controlling Listening-oriented Dialogue using Partially Observable Markov Decision Processes
This paper investigates how to automatically create a dialogue control component of a listening agent to reduce the current high cost of manually creating such components. We collected a large number of listening-oriented dialogues with their user satisfaction ratings and used them to create a dialogue control component using partially observable Markov decision processes (POMDPs), which can le...
Dialogue Strategy Learning in Healthcare: A Systematic Approach for Learning Dialogue Models from Data
We aim to build dialogue agents that optimize the dialogue strategy, specifically through learning the dialogue model components from dialogue data. In this paper, we describe our current research on automatically learning dialogue strategies in the healthcare domain. We go through our systematic approach of learning dialogue model components from data, specifically user intents and the user mo...
Dialogue Control Algorithm for Ambient Intelligence based on Partially Observable Markov Decision Processes
From the viewpoint of supporting users’ natural dialogue communication with conversational agents, their dialogue management has to determine any agent’s action, based on probabilistic methods derived from noisy data through sensors in the real world. We believe unique Partially Observable Markov Decision Processes (POMDPs) should be applied to such action control systems. The agents must flexi...